Abstract

Principal component analysis (PCA) is an unsupervised method for learning low-dimensional features with orthogonal projections. Multilinear PCA methods extend PCA to deal with multidimensional data (tensors) directly via tensor-to-tensor projection or tensor-to-vector projection (TVP). However, under the TVP setting, it is difficult to develop an effective multilinear PCA method with the orthogonality constraint. This paper tackles this problem by proposing a novel Semi-Orthogonal Multilinear PCA (SO-MPCA) approach. SO-MPCA learns low-dimensional features directly from tensors via TVP by imposing the orthogonality constraint in only one mode. This formulation results in more captured variance and more learned features than full orthogonality. For better generalization, we further introduce a relaxed start (RS) strategy to get SO-MPCA-RS by fixing the starting projection vectors, which increases the bias and reduces the variance of the learning model. Experiments on both face (2D) and gait (3D) data demonstrate that SO-MPCA-RS outperforms other competing algorithms on the whole, and the relaxed start strategy is also effective for other TVP-based PCA methods.


1 Introduction 


Principal component analysis (PCA) is a classical unsupervised dimensionality reduction method [Jolliffe, 2002]. It transforms input data into a new feature space of lower dimension via orthogonal projections, while keeping most of the variance of the original data. PCA is widely used in areas such as data compression [Kusner et al., 2014], computer vision [Ke and Sukthankar, 2004], and pattern recognition [Anaraki and Hughes, 2014; Deng et al., 2014].

Many real-world data are multi-dimensional, in the form of tensors rather than vectors [Kolda and Bader, 2009]. The number of dimensions of a tensor is its order, and each dimension is a mode of it. For example, gray images are second-order tensors (matrices) and video sequences are third-order tensors [Lu et al., 2013]. Tensor data are also common in applications such as data center monitoring, social network analysis, and network forensics [Faloutsos et al., 2007]. However, PCA on multi-dimensional data requires reshaping tensors into vectors first. This vectorization often breaks the original data structure, yields a more complex model with many parameters, and incurs high computational and memory demands [Lu et al., 2013]. Many researchers address this problem via multilinear extensions of PCA that deal with tensors directly, and there are two main approaches.

One approach is based on Tensor-to-Tensor Projection (TTP), which learns low-dimensional tensors from high-dimensional tensors. The two-dimensional PCA (2DPCA) [Yang et al., 2004] is probably the first PCA extension to deal with images without vectorization. The generalized low rank approximation of matrices (GLRAM) [Ye, 2005] and the generalized PCA (GPCA) [Ye et al., 2004] further generalize 2DPCA from single-sided projections to two-sided projections via reconstruction error minimization and variance maximization, respectively. Concurrent subspace analysis (CSA) [Xu et al., 2005] and multilinear PCA (MPCA) [Lu et al., 2008] extend GLRAM and GPCA to general higher-order tensors, respectively.

Another approach is based on Tensor-to-Vector Projection (TVP), which learns low-dimensional vectors from high-dimensional tensors in a successive way. The tensor rank-one decomposition (TROD) [Shashua and Levin, 2001] minimizes reconstruction error via (greedy) successive residue calculation. The uncorrelated multilinear PCA (UMPCA) [Lu et al., 2009] maximizes variance with the zero-correlation constraint, following the successive derivation of PCA. However, the number of features that can be extracted by UMPCA is upper-bounded by the lowest mode dimension. For example, for a tensor of size 300 × 200 × 3, UMPCA can extract only three features, which are of very limited use.


[Hua et al., 2007; Kokiopoulou and Saad, 2007; Gao et al., 2013], tensor decomposition [Kolda, 2001], and low-rank tensor approximation [Edelman et al., 1998; Wang et al.]. TTP-based PCA methods produce orthogonal projection vectors in each mode. However, none of the existing TVP-based PCA methods derive orthogonal projections. Our study found that it is indeed ineffective to impose full orthogonality in all the modes for TVP-based PCA, due to low captured variance and a limited number of extracted features.

In this paper, we present a new TVP-based multilinear PCA algorithm, Semi-Orthogonal Multilinear PCA (SO-MPCA) with Relaxed Start, or SO-MPCA-RS, to be detailed in Sec. 3. There are two main contributions:


• We propose a novel SO-MPCA approach to maximize the captured variance via TVP with the orthogonality constraint in only one mode, which is called semi-orthogonality according to [Wang et al., 2015]. Semi-orthogonality results in more captured variance and more learned features than full orthogonality. For the same tensor of size 300 × 200 × 3 discussed earlier, SO-MPCA can extract 300 features while full-orthogonal multilinear PCA can extract only three features (similar to UMPCA).

• We introduce a Relaxed Start (RS) strategy to get SO-MPCA-RS by fixing the starting projection vectors for better generalization [Abu-Mostafa et al., 2012]. This strategy constrains the hypothesis space to a smaller set, leading to increased bias and reduced variance of the learning model. The experimental results in Sec. 4 show that SO-MPCA-RS outperforms other competing PCA-based methods on the whole. In addition, this is a new strategy for tensor-based algorithms and we show its effectiveness for other TVP-based PCA methods.


In the following, we cover the necessary background first. 


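2 Background

Notations and basic operations: We follow the notations in [Lathauwer et al., 2000] to denote vectors by lowercase boldface letters, e.g., $\mathbf{x}$; matrices by uppercase boldface letters, e.g., $\mathbf{X}$; and tensors by calligraphic letters, e.g., $\mathcal{X}$. We denote their elements with indices in parentheses, and indices by lowercase letters spanning the range from 1 to the uppercase letter of the index, e.g., $n = 1, \ldots, N$. An $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is addressed by $N$ indices $\{i_n\}$. Each $i_n$ addresses the $n$-mode of $\mathcal{A}$. The $n$-mode product of an $N$th-order tensor $\mathcal{A}$ by a vector $\mathbf{u} \in \mathbb{R}^{I_n}$, denoted by $\mathcal{B} = \mathcal{A} \times_n \mathbf{u}^T$, is a tensor with entries:

$$\mathcal{B}(i_1, \ldots, i_{n-1}, 1, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N) \cdot \mathbf{u}(i_n). \qquad (1)$$

Tensor-to-vector projection: Elementary multilinear projections (EMPs) are the building blocks of a TVP. We denote an EMP as $\{\mathbf{u}^{(1)}, \mathbf{u}^{(2)}, \ldots, \mathbf{u}^{(N)}\}$, consisting of one unit projection vector in each mode, i.e., $\|\mathbf{u}^{(n)}\| = 1$ for $n = 1, \ldots, N$, where $\|\cdot\|$ is the Euclidean norm for vectors. It projects a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ to a scalar $y$ through the $N$ unit projection vectors as [Lu et al., 2013]:

$$y = \mathcal{X} \times_1 \mathbf{u}^{(1)T} \times_2 \mathbf{u}^{(2)T} \cdots \times_N \mathbf{u}^{(N)T}. \qquad (2)$$

The TVP of a tensor $\mathcal{X}$ to a vector $\mathbf{y} \in \mathbb{R}^{P}$ consists of $P$ EMPs $\{\mathbf{u}_p^{(1)}, \ldots, \mathbf{u}_p^{(N)}\}$, $p = 1, \ldots, P$, which can be written concisely as $\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}_{p=1}^{P}$:

$$\mathbf{y} = \mathcal{X} \times_{n=1}^{N} \{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}_{p=1}^{P}, \qquad (3)$$

where the $p$th component of $\mathbf{y}$ is obtained from the $p$th EMP as:

$$y_p = \mathbf{y}(p) = \mathcal{X} \times_1 \mathbf{u}_p^{(1)T} \cdots \times_N \mathbf{u}_p^{(N)T} = \mathcal{X} \times_{n=1}^{N} \{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}. \qquad (4)$$


3 SO-MPCA with Relaxed Start

This section presents the proposed SO-MPCA-RS by first formulating the SO-MPCA problem, then deriving the solutions with a successive and conditional approach, and finally introducing the relaxed start strategy for better generalization.


3.1 Formulation of Semi-Orthogonal MPCA

We define the SO-MPCA problem with the orthogonality constraint in only one mode, i.e., semi-orthogonality [Wang et al., 2015], as follows:

The SO-MPCA problem: A set of $M$ tensor data samples $\{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M\}$ are available for training. Each sample $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be viewed as a point in the tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$, where $I_n$ is the $n$-mode dimension and $\otimes$ denotes the Kronecker product. SO-MPCA considers a TVP, which consists of $P$ EMPs $\{\mathbf{u}_p^{(n)} \in \mathbb{R}^{I_n}, n = 1, \ldots, N\}_{p=1}^{P}$, that projects the input tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$ into a vector subspace $\mathbb{R}^{P}$, i.e.,

$$\mathbf{y}_m = \mathcal{X}_m \times_{n=1}^{N} \{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}_{p=1}^{P}, \qquad (5)$$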


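for $m = 1, \ldots, M$. The objective is to find a TVP to maximize the variance of the projected samples in each projection direction, subject to the orthogonality constraint in only one mode, denoted as the $\nu$-mode. The variance is measured by the total scatter $S_p$ defined as:

$$S_p = \sum_{m=1}^{M} (y_{mp} - \bar{y}_p)^2, \qquad (6)$$

where $y_{mp} = \mathcal{X}_m \times_{n=1}^{N} \{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}$ is the projection of the $m$th sample by the $p$th EMP, and $\bar{y}_p = \frac{1}{M} \sum_{m} y_{mp}$.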

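In other words, the objective of SO-MPCA is to obtain the $P$ EMPs, with the $p$th EMP determined as:

$$\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\} = \arg\max \sum_{m=1}^{M} (y_{mp} - \bar{y}_p)^2, \qquad (7)$$

$$\text{s.t.} \quad \mathbf{u}_p^{(n)T} \mathbf{u}_p^{(n)} = 1 \ \text{for} \ n = 1, \ldots, N \ \text{and} \qquad (8)$$

$$\mathbf{u}_p^{(\nu)T} \mathbf{u}_q^{(\nu)} = 0 \ \text{for} \ p > 1 \ \text{and} \ q = 1, \ldots, p-1, \qquad (9)$$

where the orthogonality constraint (9) is imposed only in the $\nu$-mode and there is no such constraint for the other modes ($n = 1, \ldots, N$, $n \neq \nu$). The normalization constraint (8) is imposed for all modes.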
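The projection in (5) and the scatter in (6) can be made concrete with a small NumPy sketch. It is an illustration only: the function names (`emp_project`, `tvp_project`, `total_scatter`) and the toy data are assumptions introduced here, not part of the paper's Matlab code, and modes are indexed from 0 as is usual in Python.

```python
import numpy as np

def emp_project(X, emp):
    """Project one tensor X to a scalar with one EMP (one unit vector per mode)."""
    y = X
    # Contract axis 0 repeatedly: after each contraction the next mode becomes axis 0.
    for u in emp:
        y = np.tensordot(u, y, axes=(0, 0))
    return float(y)

def tvp_project(X, emps):
    """Project X to a P-vector using P EMPs (one scalar per EMP)."""
    return np.array([emp_project(X, emp) for emp in emps])

def total_scatter(samples, emp):
    """Total scatter of the projections of all samples by a single EMP."""
    y = np.array([emp_project(X, emp) for X in samples])
    return float(np.sum((y - y.mean()) ** 2))

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)

# Toy data (hypothetical): M = 20 random tensors of size 4 x 3 x 2 and P = 3 random EMPs.
rng = np.random.default_rng(0)
samples = rng.standard_normal((20, 4, 3, 2))
emps = [[unit(rng.standard_normal(d)) for d in (4, 3, 2)] for _ in range(3)]

y0 = tvp_project(samples[0], emps)                      # feature vector of the first sample
scatters = [total_scatter(samples, emp) for emp in emps]  # one scatter value per EMP
```

SO-MPCA searches for the EMPs that make each of these scatter values as large as possible under constraints (8) and (9); the random EMPs above only show how the quantities are evaluated.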

Bound on the number of features: Based on the proof of Corollary 1 in [Lu et al., 2009], we can derive that the number of features $P$ that can be extracted by SO-MPCA is upper-bounded by the $\nu$-mode dimension $I_\nu$: $P \le I_\nu$. Since we can choose any $n$ as $\nu$, we have the upper bound of $P$ as $P \le \max_n I_n$ (i.e., the highest mode dimension).

Selection of mode $\nu$: Although we are free to choose any mode $n$ as $\nu$ to impose the orthogonality constraint (9), it is often good to have more features in practice. Thus, in this paper, we choose the mode with the highest dimension as $\nu$:

$$\nu = \arg\max_n I_n, \qquad (10)$$

such that $P = \max_n I_n$. On the other hand, we can also obtain a total of $\sum_{n=1}^{N} I_n$ features by running SO-MPCA $N$ times with $\nu = 1, \ldots, N$. In this paper, we only focus on SO-MPCA with $\nu$ determined by (10).
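As a quick check of the mode selection and the feature bounds, the following sketch evaluates them for the 300 × 200 × 3 example from Sec. 1. The function name `select_mode_and_bounds` and the dictionary keys are hypothetical, and the returned mode index is 0-based (so index 0 corresponds to $\nu = 1$ in the paper's notation).

```python
import numpy as np

def select_mode_and_bounds(shape):
    """Pick the highest-dimensional mode and report the feature-count bounds."""
    dims = np.asarray(shape)
    nu = int(np.argmax(dims))                      # 0-based index of the selected mode
    return {
        "nu": nu,
        "max_features_so_mpca": int(dims[nu]),     # P <= max_n I_n
        "total_over_all_modes": int(dims.sum()),   # running SO-MPCA once per mode
        "lowest_mode_dim": int(dims.min()),        # the bound quoted for UMPCA in Sec. 1
    }

info = select_mode_and_bounds((300, 200, 3))
```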

Semi-orthogonality vs. full-orthogonality: If we impose the orthogonality constraint (9) in all modes, we get Full-Orthogonal Multilinear PCA (FO-MPCA). However, our study found that FO-MPCA is not effective, primarily for two reasons:

• Due to the heavy constraints, the variance captured by FO-MPCA is quite low, even lower than that of UMPCA. In contrast, SO-MPCA can capture more variance than both FO-MPCA and UMPCA. This is illustrated in Fig. 1 in Sec. 4.

• Similar to UMPCA, the number of features that can be extracted by FO-MPCA is upper-bounded by the lowest mode dimension $\min_n I_n$, which can be quite limited. For instance, FO-MPCA can extract only three features for a tensor of size 300 × 200 × 3, while SO-MPCA can extract 300 features by choosing $\nu = 1$ for the same tensor. This can be observed in Fig. 1 as well.


3.2 Successive Derivation of SO-MPCA 


To solve the SO-MPCA problem, we follow the successive derivation in [Jolliffe, 2002; Lu et al., 2009] to determine the EMPs one by one in $P$ steps:

Step 1 ($p = 1$): Determine the first EMP $\{\mathbf{u}_1^{(n)}, n = 1, \ldots, N\}$ by maximizing $S_1$ with the constraint (8).

Step $p$ ($p = 2, \ldots, P$): Determine the $p$th EMP $\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}$ by maximizing $S_p$ with the constraints (8) and (9).

Conditional subproblem: In order to obtain the $p$th EMP $\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}$, we need to determine $N$ vectors. We follow the approach of alternating least squares [Harshman, 1970]. Thus, we can only obtain locally optimal solutions, as in many other tensor-based methods. For the $p$th EMP, the parameters of the $n$-mode projection vector $\mathbf{u}_p^{(n)}$ are estimated one mode at a time, conditioned on the projection vectors in all the other modes. Assuming the $p$th projection vectors in all but the $n$-mode are given, we project the input tensor samples in these $(N-1)$ modes to obtain the partial multilinear projections as in [Lu et al., 2013]:

$$\mathbf{y}_{mp}^{(n)} = \mathcal{X}_m \times_1 \mathbf{u}_p^{(1)T} \cdots \times_{n-1} \mathbf{u}_p^{(n-1)T} \times_{n+1} \mathbf{u}_p^{(n+1)T} \cdots \times_N \mathbf{u}_p^{(N)T}, \qquad (11)$$


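where $\mathbf{y}_{mp}^{(n)} \in \mathbb{R}^{I_n}$. This conditional subproblem then reduces to determining $\mathbf{u}_p^{(n)}$ that projects the vector samples $\{\mathbf{y}_{mp}^{(n)}, m = 1, \ldots, M\}$ onto a line to maximize the variance captured. The total scatter matrix $\mathbf{S}_p^{(n)}$ corresponding to $\{\mathbf{y}_{mp}^{(n)}, m = 1, \ldots, M\}$ is:

$$\mathbf{S}_p^{(n)} = \sum_{m=1}^{M} \left(\mathbf{y}_{mp}^{(n)} - \bar{\mathbf{y}}_p^{(n)}\right) \left(\mathbf{y}_{mp}^{(n)} - \bar{\mathbf{y}}_p^{(n)}\right)^T, \qquad (12)$$

where $\bar{\mathbf{y}}_p^{(n)} = \frac{1}{M} \sum_{m=1}^{M} \mathbf{y}_{mp}^{(n)}$.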

For $p = 1$ (step 1), the solution for $\mathbf{u}_1^{(n)}$, where $n = 1, \ldots, N$, is obtained as the unit eigenvector of $\mathbf{S}_1^{(n)}$ associated with the largest eigenvalue.

For $p \ge 2$, we need to deal with the $\nu$-mode and the other modes differently. For modes other than $\nu$, the solution for $\mathbf{u}_p^{(n)}$, where $n = 1, \ldots, N$, $n \neq \nu$, is obtained as the unit eigenvector of $\mathbf{S}_p^{(n)}$ associated with the largest eigenvalue.
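The unconstrained conditional update described above (partial projection, scatter matrix, leading eigenvector) can be sketched in NumPy as follows. The helper names are hypothetical, modes are 0-based, and the uniform EMP used in the usage lines is just a convenient starting point for illustration.

```python
import numpy as np

def partial_projection(X, emp, n):
    """Project X in every mode except mode n; returns a vector of length I_n."""
    y = X
    # Contract from the last mode down so the indices of the remaining axes stay valid.
    for mode in reversed(range(X.ndim)):
        if mode != n:
            y = np.tensordot(y, emp[mode], axes=(mode, 0))
    return y

def scatter_matrix(samples, emp, n):
    """Total scatter matrix of the mode-n partial projections of all samples."""
    Y = np.stack([partial_projection(X, emp, n) for X in samples])  # shape (M, I_n)
    Yc = Y - Y.mean(axis=0)
    return Yc.T @ Yc

def leading_eigenvector(S):
    """Unit eigenvector of a symmetric matrix for its largest eigenvalue."""
    w, V = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    return V[:, -1]

# Usage on toy data: update the mode-1 vector with the other modes held fixed.
rng = np.random.default_rng(1)
samples = rng.standard_normal((10, 4, 3, 2))
emp = [np.ones(d) / np.sqrt(d) for d in (4, 3, 2)]
emp[1] = leading_eigenvector(scatter_matrix(samples, emp, 1))
```

Cycling this update over the modes for a fixed number of iterations gives the alternating scheme; only the $\nu$-mode update for $p \ge 2$ needs the additional orthogonality constraint.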

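Constrained optimization for the $\nu$-mode and $p \ge 2$: When $p \ge 2$, we need to determine $\mathbf{u}_p^{(\nu)}$ by solving the following constrained optimization problem:

$$\mathbf{u}_p^{(\nu)} = \arg\max \ \mathbf{u}_p^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)}, \qquad (13)$$

$$\text{s.t.} \quad \mathbf{u}_p^{(\nu)T} \mathbf{u}_p^{(\nu)} = 1 \ \text{and} \ \mathbf{u}_p^{(\nu)T} \mathbf{u}_q^{(\nu)} = 0, \quad q = 1, \ldots, p-1.$$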

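We solve this problem by the following theorem:

Theorem 1. The solution to the problem (13) is the (unit-length) eigenvector corresponding to the largest eigenvalue of the following eigenvalue problem:

$$\boldsymbol{\Gamma}_p^{(\nu)} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} = \lambda \mathbf{u}_p^{(\nu)}, \qquad (14)$$

where

$$\boldsymbol{\Gamma}_p^{(\nu)} = \mathbf{I}_{I_\nu} - \sum_{q=1}^{p-1} \mathbf{u}_q^{(\nu)} \mathbf{u}_q^{(\nu)T}, \qquad (15)$$

and $\mathbf{I}_{I_\nu}$ is an identity matrix of size $I_\nu \times I_\nu$.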
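A numerical sketch of Theorem 1 is given below. One deliberate substitution is made and should be read as an assumption of this sketch: because the earlier $\nu$-mode vectors are orthonormal, $\boldsymbol{\Gamma}_p^{(\nu)}$ is a symmetric projector, so an eigenvector of $\boldsymbol{\Gamma}\mathbf{S}$ with nonzero eigenvalue is also an eigenvector of the symmetric matrix $\boldsymbol{\Gamma}\mathbf{S}\boldsymbol{\Gamma}$ with the same eigenvalue. The code solves the symmetric form with `numpy.linalg.eigh` for numerical convenience; the function name is hypothetical.

```python
import numpy as np

def constrained_leading_eigenvector(S, U_prev):
    """Maximize u^T S u subject to ||u|| = 1 and u orthogonal to the columns of U_prev.

    S is a symmetric positive semi-definite scatter matrix; U_prev is assumed to
    have orthonormal columns (the previously obtained nu-mode vectors).
    """
    size = S.shape[0]
    Gamma = np.eye(size) - U_prev @ U_prev.T      # projector from Eq. (15)
    # Symmetric equivalent of the eigenproblem Gamma S u = lambda u (valid for lambda > 0).
    w, V = np.linalg.eigh(Gamma @ S @ Gamma)
    u = V[:, -1]
    return u / np.linalg.norm(u), float(w[-1]), Gamma

# Usage on a random example: a 6 x 6 scatter matrix and two earlier orthonormal vectors.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
S = A @ A.T
U_prev, _ = np.linalg.qr(rng.standard_normal((6, 2)))
u, lam, Gamma = constrained_leading_eigenvector(S, U_prev)
```

The returned `u` satisfies both the original eigenproblem (14) and the orthogonality constraints, which the proof below establishes analytically.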


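Proof. First, we use Lagrange multipliers to transform the problem (13) to include all the constraints as:

$$\mathcal{L}_\nu = \mathbf{u}_p^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - \lambda \left(\mathbf{u}_p^{(\nu)T} \mathbf{u}_p^{(\nu)} - 1\right) - \sum_{q=1}^{p-1} \mu_q \mathbf{u}_p^{(\nu)T} \mathbf{u}_q^{(\nu)}, \qquad (16)$$

where $\lambda$ and $\{\mu_q, q = 1, \ldots, p-1\}$ are Lagrange multipliers.

Then we set the partial derivative of $\mathcal{L}_\nu$ with respect to $\mathbf{u}_p^{(\nu)}$ to zero:

$$\frac{\partial \mathcal{L}_\nu}{\partial \mathbf{u}_p^{(\nu)}} = 2 \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - 2 \lambda \mathbf{u}_p^{(\nu)} - \sum_{q=1}^{p-1} \mu_q \mathbf{u}_q^{(\nu)} = \mathbf{0}. \qquad (17)$$

Premultiplying (17) by $\mathbf{u}_p^{(\nu)T}$, the third term vanishes and we get

$$2 \mathbf{u}_p^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - 2 \lambda \mathbf{u}_p^{(\nu)T} \mathbf{u}_p^{(\nu)} = 0 \ \Rightarrow \ \lambda = \mathbf{u}_p^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)}, \qquad (18)$$

which indicates that $\lambda$ is exactly the criterion to be maximized, with the orthogonality constraint.

Next, a set of $(p-1)$ equations are obtained by premultiplying (17) by $\mathbf{u}_q^{(\nu)T}$, $q = 1, \ldots, p-1$, respectively:

$$2 \mathbf{u}_q^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - 2 \lambda \mathbf{u}_q^{(\nu)T} \mathbf{u}_p^{(\nu)} - \sum_{s=1}^{p-1} \mu_s \mathbf{u}_q^{(\nu)T} \mathbf{u}_s^{(\nu)} = 0. \qquad (19)$$

The second term vanishes and the summand in the third term is non-zero only for $s = q$. Thus, we get

$$2 \mathbf{u}_q^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - \mu_q = 0 \ \Rightarrow \ \mu_q = 2 \mathbf{u}_q^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)}. \qquad (20)$$

Substituting (20) into (17), we get

$$2 \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - 2 \lambda \mathbf{u}_p^{(\nu)} - \sum_{q=1}^{p-1} 2 \mathbf{u}_q^{(\nu)} \mathbf{u}_q^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} = \mathbf{0}$$

$$\Rightarrow \ \lambda \mathbf{u}_p^{(\nu)} = \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} - \sum_{q=1}^{p-1} \mathbf{u}_q^{(\nu)} \mathbf{u}_q^{(\nu)T} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} \qquad (21)$$

$$\Rightarrow \ \lambda \mathbf{u}_p^{(\nu)} = \left[\mathbf{I}_{I_\nu} - \sum_{q=1}^{p-1} \mathbf{u}_q^{(\nu)} \mathbf{u}_q^{(\nu)T}\right] \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)}. \qquad (22)$$

Using the definition in (15), (22) can be rewritten as:

$$\boldsymbol{\Gamma}_p^{(\nu)} \mathbf{S}_p^{(\nu)} \mathbf{u}_p^{(\nu)} = \lambda \mathbf{u}_p^{(\nu)}. \qquad (23)$$

Since $\lambda$ is the criterion to be maximized, this maximization is achieved by setting $\mathbf{u}_p^{(\nu)}$ to the (unit) eigenvector of $\boldsymbol{\Gamma}_p^{(\nu)} \mathbf{S}_p^{(\nu)}$ associated with its largest eigenvalue. □


Algorithm 1 Semi-Orthogonal Multilinear PCA with Relaxed Start (SO-MPCA-RS)

 1: Input: A set of tensor samples $\{\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}, m = 1, \ldots, M\}$, and the maximum number of iterations $K$.
 2: Set $\nu = \arg\max_n I_n$.
 3: Set the first EMP: $\mathbf{u}_1^{(n)} = \mathbf{1} / \|\mathbf{1}\|$ for $n = 1, \ldots, N$.
 4: for $p = 2$ to $P$ do
 5:     Initialize $\mathbf{u}_p^{(n)} = \mathbf{1} / \|\mathbf{1}\|$ for $n = 1, \ldots, N$.
 6:     for $k = 1$ to $K$ do
 7:         for $n = 1$ to $N$ do
 8:             Calculate the partial multilinear projection $\{\mathbf{y}_{mp}^{(n)}\}$ for $m = 1, \ldots, M$ according to (11).
 9:             if $n == \nu$ then
10:                 Calculate $\boldsymbol{\Gamma}_p^{(\nu)}$ and $\mathbf{S}_p^{(\nu)}$ by (15) and (12), respectively. Then, set $\mathbf{u}_p^{(\nu)}$ to the eigenvector of $\boldsymbol{\Gamma}_p^{(\nu)} \mathbf{S}_p^{(\nu)}$ associated with the largest eigenvalue.
11:             else
12:                 Calculate $\mathbf{S}_p^{(n)}$ by (12). Set $\mathbf{u}_p^{(n)}$ to the eigenvector of $\mathbf{S}_p^{(n)}$ associated with the largest eigenvalue.
13:             end if
14:         end for
15:     end for
16: end for
17: Output: The TVP $\{\mathbf{u}_p^{(n)}, n = 1, \ldots, N\}_{p=1}^{P}$.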


3.3 Relaxed Start for Better Generalization 

When we use SO-MPCA features for classification, we find the performance is limited. Therefore, we further introduce a simple relaxed start (RS) strategy to get SO-MPCA-RS by fixing the first EMP $\{\mathbf{u}_1^{(n)}, n = 1, \ldots, N\}$ (the starting projection vectors), without variance maximization. In this paper, we set this starting EMP $\mathbf{u}_1^{(n)}$ (for $n = 1, \ldots, N$) to the normalized uniform vector $\mathbf{1} / \|\mathbf{1}\|$ for simplicity.

This idea is motivated by the theoretical studies in Chapter 4 of [Abu-Mostafa et al., 2012] showing that constraining a learning model could lead to better generalization. By fixing the first EMP as simple vectors, the following EMPs have less freedom due to the imposed semi-orthogonality, which increases the bias and reduces the variance of the learning model. Thus, the SO-MPCA-RS model has a smaller hypothesis set than the SO-MPCA model. The two algorithms differ only in how the first (starting) EMP is determined, though the following EMPs will all be different due to their dependency on the first EMP.
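The starting EMP is simple to construct, and it has one consequence worth seeing numerically: any later $\nu$-mode vector that is orthogonal to the uniform vector must have entries that sum to zero. The sketch below illustrates both points; `uniform_start` is a hypothetical helper name and the 80 × 60 shape is borrowed from the face data in Sec. 4.

```python
import numpy as np

def uniform_start(shape):
    """Starting EMP of the relaxed start: one normalized all-ones vector per mode."""
    return [np.ones(d) / np.sqrt(d) for d in shape]

start = uniform_start((80, 60))

# Remove the uniform component from an arbitrary vector: what remains is orthogonal
# to the starting vector, and therefore its entries sum to zero.
rng = np.random.default_rng(0)
v = rng.standard_normal(80)
v_orth = v - (v @ start[0]) * start[0]
entry_sum = float(v_orth.sum())
```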

This relaxed start strategy is not specific to SO-MPCA but 
generally applicable to any TVP-based subspace learning al¬ 
gorithm. We run controlled experiments in Sec. 4 to show 
that it can improve the performance of not only SO-MPCA 
but also TROD and UMPCA. 

Algorithm 1 summarizes the SO-MPCA-RS algorithm.¹ The SO-MPCA algorithm can be obtained from Algorithm 1 by removing line 3, changing $p = 2$ in line 4 to $p = 1$, and setting $\boldsymbol{\Gamma}_p^{(\nu)}$ (for $p = 1$) in line 10 to an identity matrix.
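For readers who prefer code to pseudocode, the steps of Algorithm 1 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the names `so_mpca_rs` and `_partial` are hypothetical, the samples are assumed to be stacked in one array of shape (M, I_1, ..., I_N), modes are 0-based, and the $\nu$-mode eigenproblem is solved in the symmetric form $\boldsymbol{\Gamma}\mathbf{S}\boldsymbol{\Gamma}$ (equivalent for nonzero eigenvalues because $\boldsymbol{\Gamma}$ is a projector).

```python
import numpy as np

def _partial(X, emp, n):
    """Project X in every mode except mode n, as in Eq. (11)."""
    y = X
    for mode in reversed(range(X.ndim)):
        if mode != n:
            y = np.tensordot(y, emp[mode], axes=(mode, 0))
    return y

def so_mpca_rs(samples, P, K=20):
    """Return (emps, nu); emps[p][n] is the unit mode-n vector of the (p+1)th EMP."""
    shape = samples.shape[1:]
    N = len(shape)
    nu = int(np.argmax(shape))                            # line 2
    P = min(P, shape[nu])                                 # P cannot exceed I_nu
    uniform = [np.ones(d) / np.sqrt(d) for d in shape]
    emps = [[u.copy() for u in uniform]]                  # line 3: fixed first EMP
    for p in range(1, P):                                 # line 4
        emp = [u.copy() for u in uniform]                 # line 5
        U_prev = np.stack([e[nu] for e in emps], axis=1)  # earlier nu-mode vectors
        Gamma = np.eye(shape[nu]) - U_prev @ U_prev.T     # Eq. (15)
        for _ in range(K):                                # line 6
            for n in range(N):                            # line 7
                Y = np.stack([_partial(X, emp, n) for X in samples])  # line 8
                Yc = Y - Y.mean(axis=0)
                S = Yc.T @ Yc                             # Eq. (12)
                if n == nu:
                    S = Gamma @ S @ Gamma                 # line 10, symmetric form
                w, V = np.linalg.eigh(S)
                emp[n] = V[:, -1]                         # leading eigenvector
        emps.append(emp)
    return emps, nu

# Usage on toy data: 15 random tensors of size 6 x 5 x 4, four features.
rng = np.random.default_rng(0)
samples = rng.standard_normal((15, 6, 5, 4))
emps, nu = so_mpca_rs(samples, P=4, K=5)
```

The sketch favors clarity over speed: it recomputes the partial projections sample by sample, which is fine for small examples but would be vectorized in practice.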


4 Experiments 

This section evaluates the proposed methods on both second-order and third-order tensor data in terms of recognition rate, the number of extracted features, captured variance, and convergence. In addition, we also study the effectiveness of the relaxed start strategy on other TVP-based PCA algorithms.

Data:² For second-order tensors, we use the same subset of the FERET database [Phillips et al., 2000] as in [Lu et al., 2009], with 721 face images from 70 subjects. Each face image is normalized to 80 × 60 gray-level pixels. For third-order tensors, we use a subset of the USF HumanID “Gait Challenge” database [Sarkar et al., 2005]. We use the same gallery set (731 samples from 71 subjects) and probe A (727 samples from 71 subjects) as in [Lu et al., 2009], and we also test probe B (423 samples from 41 subjects) and probe C (420 samples from 41 subjects). Each gait sample is a (binary) silhouette sequence of size 32 × 22 × 10.

Experiment setup: In face recognition experiments, we randomly select $L = 1, 2, 3, 4, 5, 6, 7$ samples from each subject as the training data and use the rest for testing. We repeat such random splits (repetitions) ten times and report the mean correct recognition rates. In gait recognition experiments, we follow the standard setting and use the gallery set as the training data and probes A, B, and C as the test data (so there are no random splits/repetitions), and report the rank 1 and rank 5 recognition rates [Sarkar et al., 2005].

¹ Matlab code is available at: http://www.comp.hkbu.edu.hk/~haiping/codedata.html

² Both face and gait data are downloaded from: http://www.dsp.utoronto.ca/~haiping/MSL.html


Table 1: Face recognition rates in percentage (mean ± std) by the nearest neighbor classifier on the FERET subset. The top two results in each row are highlighted in bold in the typeset table, and "-" indicates that not enough features can be extracted.

L  P   PCA         CSA         MPCA        TROD        UMPCA       SO-MPCA     SO-MPCA-RS  TROD-RS     UMPCA-RS
1  1   2.60±0.66   3.87±1.02   2.52±0.76   2.70±0.44   5.98±2.65   2.73±0.69   6.85±1.44   2.63±0.82   6.04±2.00
1  5   15.12±1.31  11.90±1.25  16.65±1.82  15.88±1.20  23.23±4.49  20.06±2.34  27.27±2.36  16.68±1.50  24.78±4.76
1  10  22.69±2.21  21.35±2.76  22.24±1.94  21.52±2.83  31.83±5.17  28.77±2.72  36.34±3.56  21.86±3.03  33.16±5.36
1  20  27.62±2.57  26.16±2.38  27.16±1.47  26.30±2.49  35.94±5.65  31.94±2.95  40.32±3.40  26.51±1.97  36.65±5.46
1  50  31.38±2.58  31.37±1.92  31.29±1.71  29.63±2.21  36.14±5.73  32.33±2.78  40.48±3.09  29.80±1.48  37.05±5.46
1  80  -           31.95±1.84  32.17±2.09  31.14±2.42  -           32.26±2.71  40.41±3.09  31.21±1.94  -
2  1   2.69±0.73   3.36±0.54   2.63±0.55   2.65±0.77   7.28±2.44   2.69±0.46   7.97±1.10   2.81±0.79   6.82±1.56
2  5   20.17±1.25  15.15±1.03  21.53±0.90  21.34±1.56  26.90±5.23  24.34±1.59  33.82±1.45  21.62±1.93  29.19±3.55
2  10  32.03±2.49  29.45±1.95  28.04±1.69  30.05±1.70  40.17±6.76  36.94±2.58  46.63±1.95  30.83±1.93  44.01±3.22
2  20  39.07±1.87  37.07±2.27  38.86±2.12  36.87±2.37  44.51±6.54  41.70±2.48  52.19±2.11  37.64±2.19  48.67±3.57
2  50  43.86±2.53  44.61±2.34  44.54±2.74  42.67±2.16  45.47±6.65  42.24±2.39  52.22±1.73  42.99±2.28  49.07±3.61
2  80  45.28±2.39  45.82±2.76  46.02±2.67  44.46±2.53  -           42.20±2.39  52.19±1.80  44.78±2.48  -
3  1   2.72±0.45   4.07±0.80   2.25±0.44   2.95±0.61   7.42±1.17   2.56±0.56   7.55±1.17   2.74±0.81   7.34±1.08
3  5   23.89±1.64  16.58±0.95  25.95±1.26  24.48±1.79  33.86±3.65  28.30±2.05  36.93±1.76  26.14±2.17  32.68±4.67
3  10  37.20±1.91  36.05±1.50  34.91±2.40  34.83±2.96  49.39±2.83  43.31±1.51  54.38±3.09  34.64±3.01  50.55±4.81
3  20  46.05±2.11  43.87±1.98  45.48±2.46  43.37±2.51  55.83±3.32  49.49±2.16  61.25±2.85  42.99±2.65  55.89±4.16
3  50  51.35±2.53  51.60±2.50  52.00±2.70  48.92±2.69  56.42±3.11  49.80±2.29  61.08±2.83  49.47±2.76  56.38±4.62
3  80  52.66±2.67  52.84±2.70  53.31±2.59  51.17±2.65  -           49.77±2.36  60.98±2.83  51.45±2.63  -
4  1   2.68±0.85   3.92±1.04   2.22±0.62   3.11±0.83   7.96±2.15   3.13±0.90   8.34±1.37   3.08±0.66   7.03±1.68
4  5   25.26±1.77  18.93±1.29  28.71±1.91  27.37±2.34  37.66±4.88  29.61±2.16  40.25±1.52  28.25±2.82  38.84±2.97
4  10  41.54±2.02  40.39±2.36  39.43±2.05  38.82±3.91  55.10±4.55  47.10±2.88  59.30±2.49  39.25±2.95  57.30±4.86
4  20  49.34±1.69  49.39±2.35  50.18±3.03  47.57±2.70  62.40±4.49  53.85±2.89  66.60±3.07  47.73±2.96  64.08±4.94
4  50  56.85±2.09  57.51±2.98  57.48±2.72  54.58±2.56  63.13±4.15  54.56±3.14  67.03±2.86  54.42±2.89  64.85±5.01
4  80  58.16±2.46  59.05±2.65  58.91±2.50  57.30±2.46  -           54.47±3.17  66.89±2.90  57.48±2.46  -
5  1   2.91±0.91   4.37±1.06   2.72±0.88   2.59±0.64   7.41±2.14   3.07±0.60   8.38±0.97   2.96±0.75   7.20±1.35
5  5   28.95±2.07  20.75±1.77  32.99±2.47  31.75±2.79  40.78±5.82  34.20±2.67  42.35±3.04  33.45±1.41  41.67±1.95
5  10  47.06±1.54  45.77±2.17  43.29±3.07  43.80±3.51  60.49±6.37  53.45±2.75  63.23±3.37  44.69±2.71  62.88±2.59
5  20  55.66±1.94  56.01±2.19  56.79±2.14  54.47±1.66  66.90±6.23  61.40±2.43  69.97±2.55  54.64±1.96  70.19±3.25
5  50  63.91±1.71  64.58±2.13  64.37±2.27  61.54±2.75  67.71±6.31  62.26±2.88  70.70±2.47  61.51±1.92  70.81±3.08
5  80  64.61±1.67  65.58±1.92  65.85±2.02  64.02±2.40  -           62.18±2.83  70.70±2.45  64.02±2.03  -
6  1   2.86±0.89   3.89±0.67   2.49±1.01   2.86±1.01   9.07±0.83   2.56±0.74   8.97±0.97   2.56±0.99   7.21±1.35
6  5   30.30±2.17  21.89±2.04  33.42±2.52  33.59±2.63  42.52±4.99  35.18±1.32  43.32±1.82  35.18±2.52  44.65±4.14
6  10  48.97±2.96  49.14±2.57  45.65±2.85  45.88±2.97  63.16±5.32  56.15±2.36  65.75±2.76  47.24±2.55  66.51±3.10
6  20  58.57±2.84  58.97±2.60  59.73±2.96  57.11±3.22  70.73±5.39  64.72±3.11  74.39±2.79  57.81±2.18  74.52±3.10
6  50  66.88±2.31  67.84±2.48  67.84±2.63  64.05±2.95  72.09±5.18  65.32±2.86  74.75±2.60  65.12±2.92  75.12±2.79
6  80  68.31±2.33  69.44±2.49  69.70±2.35  66.78±2.94  -           65.08±2.73  74.72±2.56  68.01±2.90  -
7  1   2.68±1.19   4.55±1.07   2.12±1.07   2.81±0.82   11.39±1.85  2.38±1.33   10.91±1.37  1.99±1.19   8.57±1.68
7  5   29.52±1.38  22.51±1.29  34.72±2.86  32.03±2.46  44.98±5.32  35.58±2.04  45.89±2.34  35.02±1.82  45.93±1.93
7  10  51.21±2.11  49.39±3.18  46.19±2.43  46.10±2.60  65.67±5.82  56.84±2.07  67.53±2.34  48.44±3.20  66.93±2.01
7  20  59.57±2.64  60.91±2.75  61.69±2.57  57.58±2.75  73.16±4.28  65.37±2.24  74.89±1.97  58.31±2.62  75.80±2.24
7  50  68.10±2.21  69.35±1.89  69.26±2.22  65.54±2.79  74.11±4.52  65.37±2.02  75.24±2.12  66.06±3.10  76.88±1.72
7  80  69.70±2.84  70.39±1.76  70.65±1.97  67.97±2.45  -           65.37±2.02  75.19±2.18  68.14±2.96  -
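The random-split protocol used for the face experiments (select $L$ samples per subject for training, test on the rest) can be sketched as follows. The function name `split_per_subject` and the toy labels are hypothetical, and the sketch assumes every subject has more than $L$ samples; the paper does not state how smaller subjects, if any, are handled.

```python
import numpy as np

def split_per_subject(labels, L, rng):
    """Pick L training indices per subject at random; the rest form the test set."""
    labels = np.asarray(labels)
    train = []
    for subject in np.unique(labels):
        idx = np.flatnonzero(labels == subject)
        # Assumes idx.size > L; rng.choice raises ValueError if idx.size < L.
        train.extend(rng.choice(idx, size=L, replace=False))
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

# Usage: 5 hypothetical subjects with 8 samples each, L = 3, one of several repetitions.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 8)
train_idx, test_idx = split_per_subject(labels, L=3, rng=rng)
```

Repeating the call with fresh random draws and averaging the recognition rates gives the mean ± std figures of the kind reported for the face data.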

Algorithms and their settings: We first evaluate SO-MPCA and SO-MPCA-RS against five existing PCA-based methods: PCA [Jolliffe, 2002], CSA [Xu et al., 2005], MPCA [Lu et al., 2008], TROD [Shashua and Levin, 2001], and UMPCA [Lu et al., 2009].³ CSA and MPCA produce tensorial features, so they need to be vectorized. MPCA uses the full projection. For TROD and UMPCA, we use the uniform initialization [Lu et al., 2009]. For SO-MPCA and SO-MPCA-RS, we set the selected mode $\nu = 1$ for the maximum number of features. For iterative algorithms, we set the number of iterations to 20. All features are sorted according to the scatters (captured variance) in descending order for classification. We use the nearest neighbor classifier with the Euclidean distance measure to classify the top $P$ features. We test up to $P = 80$ features in face recognition and up to $P = 32$ features in gait recognition. The performance of FO-MPCA is much worse than that of SO-MPCA, so it is not included in the comparisons (except the variance study) to save space.

³ For second-order tensors, CSA and MPCA are equivalent to GLRAM [Ye, 2005] and GPCA [Ye et al., 2004], respectively.
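The evaluation step described above (sort features by scatter, keep the top $P$, classify by nearest neighbor with Euclidean distance) can be sketched as below. The function names are hypothetical. The rank-$k$ rule used here, counting a test sample as correct when its true label appears among the $k$ nearest training samples, is one common reading of rank-$k$ recognition and is an assumption of this sketch; with $k = 1$ it is exactly the nearest neighbor classifier.

```python
import numpy as np

def sort_by_scatter(train_feats):
    """Feature indices ordered by captured variance (scatter), largest first."""
    scatter = np.sum((train_feats - train_feats.mean(axis=0)) ** 2, axis=0)
    return np.argsort(scatter)[::-1]

def rank_k_rate(train_feats, train_labels, test_feats, test_labels, P, k=1):
    """Recognition rate in percent using the top-P features and Euclidean distance."""
    train_labels = np.asarray(train_labels)
    test_labels = np.asarray(test_labels)
    order = sort_by_scatter(train_feats)[:P]
    A, B = train_feats[:, order], test_feats[:, order]
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)   # squared distances
    nearest = np.argsort(d2, axis=1)[:, :k]                   # k nearest training samples
    hits = (train_labels[nearest] == test_labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Usage on a tiny hand-made example with three well-separated classes.
train = np.array([[0.0, 0.0, 0.0], [0.0, 10.0, 0.1], [0.0, 20.0, 0.2]])
labels = np.array([0, 1, 2])
order = sort_by_scatter(train)
rate = rank_k_rate(train, labels, train + 0.01, labels, P=2, k=1)
```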

Face recognition results: Table 1 shows the face recognition results for $P = 1, 5, 10, 20, 50, 80$ and $L = 1, 2, 3, 4, 5, 6, 7$, including both the mean and the standard deviation (std) over ten repetitions. We highlight the top two results in each row in bold fonts for easy comparison. Only SO-MPCA-RS consistently achieves the top 2 results in all cases. Compared with existing methods (PCA, CSA, MPCA, TROD and UMPCA), SO-MPCA-RS outperforms the best performing existing algorithm (UMPCA) by 3.79% on average.

Furthermore, for larger $L = 5, 6, 7$, SO-MPCA-RS outperforms the other five methods by at least 2.26% on average. For smaller $L = 1, 2, 3$, SO-MPCA-RS achieves a greater improvement of at least 5.28% over existing methods, indicating that SO-MPCA-RS is superior in dealing with the small sample size (overfitting) problem.

Gait recognition results: Similarly, the gait recognition results are reported in Table 2 with the top two results highlighted. Again, only SO-MPCA-RS consistently achieves the top 2 results in all cases. In rank 1 rate, the best performing existing algorithm is PCA, which outperforms SO-MPCA-RS by 3.73% on average. While in rank 5 rate, SO-MPCA-RS outperforms the best performing existing algorithm (still PCA) by 6.76% on average.

Number of features: In the tables, we use "-" to indicate that there are not enough features. For PCA, there are at most 69 features for face data when $L = 1$ since there are only 70 samples for training. UMPCA can only extract 60 or 10 features for face and gait data, respectively. In contrast, SO-MPCA and SO-MPCA-RS (with $\nu = 1$) can learn 80 features for face data and 32 features for gait data.

Feature variance: We illustrate the variance captured by PCA, UMPCA, FO-MPCA, SO-MPCA, and SO-MPCA-RS in Fig. 1 for face data with $L = 1$ (not all methods are shown for clarity). Figure 1(a) shows the sorted variance. It is clear that semi-orthogonality captures more variance than full-orthogonality, as we discussed in Sec. 3.1. Moreover, both SO-MPCA and SO-MPCA-RS can capture more variance than UMPCA, but less than PCA (and also CSA, MPCA, and TROD, which are not shown). Though capturing less variance, SO-MPCA-RS achieves better overall classification performance than other PCA-based methods, with results consistently in the top two in all experiments.

We also show the unsorted captured variance in Fig. 1(b). The variance captured by the first (fixed) EMP of SO-MPCA-RS is much less than that of the other EMPs, which is not surprising since its variance is not maximized.

Convergence: We demonstrate the convergence of SO-MPCA-RS in Fig. 2 for face data with $L = 1$. We can see that SO-MPCA-RS converges in just a few iterations. SO-MPCA has a similar convergence rate.

Effectiveness of relaxed start: To evaluate the proposed relaxed start strategy, we apply it to two other TVP-based methods, TROD and UMPCA, getting TROD-RS and UMPCA-RS, respectively. We summarize their performance in the last two columns of Tables 1 and 2 for face and gait recognition experiments, respectively.

Figure 1: The captured variance on face data with $L = 1$.

Figure 2: Illustration of the SO-MPCA-RS algorithm's convergence performance on the face data with $L = 1$. Panels: (a) $p = 2$; (b) $p = 5$. Horizontal axis: number of iterations.

Both tables show that relaxed start can help both TROD and UMPCA to achieve better recognition rates. From Table 1, TROD-RS improves over TROD by 0.32% and UMPCA-RS improves over UMPCA by 2.15% on average. From Table 2, TROD-RS achieves 3.03% improvement over TROD and UMPCA-RS achieves 1.07% improvement over UMPCA for rank 1 rate. For rank 5 rate, TROD-RS improves 0.94% over TROD, and UMPCA-RS improves 5.28% over UMPCA. The relaxed start is most effective for our SO-MPCA. SO-MPCA-RS has an improvement of 9.97% on face data over SO-MPCA. On gait data, SO-MPCA-RS outperforms SO-MPCA by 9.56% in rank 1 rate and 17.26% in rank 5 rate on average.

In addition, SO-MPCA-RS has better face recognition performance than TROD-RS and UMPCA-RS with an improvement of 8.08% and 1.64%, respectively. On gait data, SO-MPCA-RS improves the rank 1 recognition rate by 3.03% over TROD-RS and 13.25% over UMPCA-RS on average, and SO-MPCA-RS improves the rank 5 recognition rate by 11.07% and 13.73% over TROD-RS and UMPCA-RS.


Table 2: Rank 1 and rank 5 gait recognition rates in percentage (mean ± std) by the nearest neighbor classifier on the USF subset. The top two results in each row are highlighted in bold in the typeset table, and "-" indicates that not enough features can be extracted.

Rank  Probe  P   PCA     CSA     MPCA    TROD    UMPCA   SO-MPCA  SO-MPCA-RS  TROD-RS  UMPCA-RS
1     A      5   30.99   22.54   32.39   28.17   39.44   30.99    40.85       18.31    39.44
1     A      10  52.11   43.66   49.30   42.25   57.75   49.30    59.15       33.80    63.38
1     A      20  67.61   57.75   60.56   53.52   -       54.93    67.61       53.52    -
1     A      32  71.83   59.15   61.97   60.56   -       54.93    69.01       64.79    -
1     B      5   26.83   17.07   24.39   19.51   26.83   29.27    41.46       19.51    39.02
1     B      10  48.78   39.02   46.34   39.02   46.34   53.66    63.41       36.59    51.22
1     B      20  65.85   53.66   58.54   53.66   -       58.54    65.85       43.90    -
1     B      32  68.29   60.98   58.54   65.85   -       60.98    68.29       63.41    -
1     C      5   12.20   9.76    14.63   4.88    24.39   12.20    14.63       7.32     14.63
1     C      10  29.27   14.63   19.51   14.63   29.27   29.27    29.27       19.51    21.95
1     C      20  34.15   31.71   29.27   21.95   -       31.71    34.15       24.39    -
1     C      32  46.34   34.15   29.27   31.71   -       31.71    39.02       39.02    -
5     A      5   57.75   56.34   66.20   54.93   73.24   67.61    84.51       42.25    73.24
5     A      10  80.28   74.65   77.46   77.46   83.10   77.46    88.73       59.15    85.92
5     A      20  87.32   80.28   81.69   76.06   -       80.28    92.96       77.46    -
5     A      32  87.32   81.69   83.10   78.87   -       80.28    92.96       81.69    -
5     B      5   48.78   48.78   53.66   48.78   58.54   63.41    68.29       53.66    58.54
5     B      10  73.17   70.73   70.73   60.98   65.85   70.73    75.61       75.61    68.29
5     B      20  78.05   75.61   78.05   75.61   -       70.73    80.49       78.05    -
5     B      32  78.05   73.17   78.05   80.49   -       70.73    80.49       80.49    -
5     C      5   51.22   34.15   41.46   36.59   41.46   43.90    56.10       29.27    36.59
5     C      10  53.66   46.34   43.90   48.78   43.90   48.78    70.73       43.90    56.10
5     C      20  65.85   56.10   60.98   48.78   -       48.78    78.05       46.34    -
5     C      32  65.85   60.98   58.54   56.10   -       48.78    78.05       56.10    -

These controlled experiments show the effectiveness of relaxed start on SO-MPCA and other TVP-based multilinear PCA methods (UMPCA and TROD). One possible explanation is that RS increases the bias and reduces the variance of the learning model, though further investigation is needed.

5 Conclusion 

This paper proposes a novel multilinear PCA algorithm under the TVP setting, named semi-orthogonal multilinear PCA with relaxed start (SO-MPCA-RS). The proposed SO-MPCA approach learns features directly from tensors via TVP to maximize the captured variance with the orthogonality constraint imposed in only one mode. This semi-orthogonality can capture more variance and learn more features than full-orthogonality. Furthermore, the introduced relaxed start strategy can achieve better generalization by fixing the starting projection vectors to uniform vectors to increase the bias and reduce the variance of the learning model. Experiments on face (2D data) and gait (3D data) recognition show that SO-MPCA-RS achieves the best overall performance compared with competing algorithms. In addition, relaxed start is also effective for other TVP-based PCA methods.

In this paper, we studied semi-orthogonality in only one mode. One possible direction for future work is to learn SO-MPCA-RS features from each mode separately and then perform a feature/score-level fusion.